
    ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment

    The objective of stylized speech-driven facial animation is to create animations that encapsulate specific emotional expressions. Existing methods often depend on pre-established emotional labels or facial expression templates, which may limit the flexibility needed to accurately convey user intent. In this research, we introduce a technique that enables the control of arbitrary styles by leveraging natural language as emotion prompts, which offers benefits in both flexibility and user-friendliness. To realize this objective, we first construct a Text-Expression Alignment Dataset (TEAD), wherein each facial expression is paired with several prompt-like descriptions. We propose an automatic annotation method, supported by Large Language Models (LLMs), to expedite the dataset construction, thereby eliminating the substantial expense of manual annotation. Following this, we utilize TEAD to train a CLIP-based model, termed ExpCLIP, which encodes text and facial expressions into semantically aligned style embeddings. The embeddings are subsequently integrated into the facial animation generator to yield expressive and controllable facial animations. Given the limited diversity of facial emotions in existing speech-driven facial animation training data, we further introduce an effective Expression Prompt Augmentation (EPA) mechanism that enables the animation generator to support unprecedented richness in style control. Comprehensive experiments show that our method accomplishes expressive facial animation generation and offers enhanced flexibility in effectively conveying the desired style.
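    The CLIP-based alignment described above can be sketched as a symmetric contrastive objective between text and expression embeddings. This is a minimal numpy illustration of the general CLIP-style loss the abstract references, not ExpCLIP's actual implementation; the function name, dimensions, and temperature are assumptions.

```python
import numpy as np

def clip_style_loss(text_emb, expr_emb, temp=0.07):
    """Symmetric contrastive loss aligning paired text and expression
    embeddings (CLIP-style sketch; row i of each matrix is a matched pair)."""
    # L2-normalize both embedding sets
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    e = expr_emb / np.linalg.norm(expr_emb, axis=1, keepdims=True)
    # pairwise cosine similarities, scaled by temperature
    logits = t @ e.T / temp
    n = logits.shape[0]

    def xent(l):
        # cross-entropy with the matched pair on the diagonal as the target
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # average of text-to-expression and expression-to-text directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

    Minimizing this loss pulls each expression embedding toward its paired prompt description and pushes it away from the other prompts in the batch, which is what lets a free-form text prompt later act as a style embedding.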

    ResLT: Residual Learning for Long-tailed Recognition

    Deep learning algorithms face great challenges with long-tailed data distributions, which are nevertheless quite common in real-world scenarios. Previous methods tackle the problem either in input space (re-sampling classes with different frequencies) or in loss space (re-weighting classes with different weights), suffering from heavy over-fitting to tail classes or hard optimization during training. To alleviate these issues, we propose a more fundamental perspective for long-tailed recognition, i.e., from the aspect of parameter space, aiming to preserve specific capacity for classes with low frequencies. From this perspective, a trivial solution that utilizes separate branches for the head, medium, and tail classes and then sums their outputs as the final result is not feasible. Instead, we design an effective residual fusion mechanism: with one main branch optimized to recognize images from all classes, another two residual branches are gradually fused and optimized to enhance images from medium+tail classes and tail classes respectively. The branches are then aggregated into the final result via additive shortcuts. We test our method on several benchmarks, i.e., long-tailed versions of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist 2018. Experimental results show that our method achieves a new state of the art for long-tailed recognition. Code will be available at https://github.com/FPNAS/ResLT.
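    The residual fusion via additive shortcuts can be sketched as follows. This is a simplified numpy illustration under assumed dimensions and a hypothetical head/medium/tail split; the real ResLT branches share a backbone and are optimized with masked losses on their class subsets, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10
feat_dim = 16

# hypothetical class split: head = 0-3, medium = 4-6, tail = 7-9
x = rng.normal(size=(2, feat_dim))          # a batch of backbone features

# one classifier per branch (illustrative linear heads)
W_main = rng.normal(size=(feat_dim, num_classes))  # optimized on all classes
W_mt   = rng.normal(size=(feat_dim, num_classes))  # residual: medium+tail classes
W_tail = rng.normal(size=(feat_dim, num_classes))  # residual: tail classes only

logits_main = x @ W_main
logits_mt   = x @ W_mt
logits_tail = x @ W_tail

# additive shortcut aggregation: residual branches refine the main branch
logits = logits_main + logits_mt + logits_tail
```

    The key design choice is that the residual branches do not replace the main branch's prediction; they add corrections on top of it, so capacity reserved for low-frequency classes cannot degrade head-class recognition.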

    Generalized Parametric Contrastive Learning

    In this paper, we propose Generalized Parametric Contrastive Learning (GPaCo/PaCo), which works well on both imbalanced and balanced data. Based on theoretical analysis, we observe that the supervised contrastive loss tends to bias toward high-frequency classes and thus increases the difficulty of imbalanced learning. We introduce a set of parametric, class-wise learnable centers to rebalance from an optimization perspective. Further, we analyze our GPaCo/PaCo loss under a balanced setting. Our analysis demonstrates that GPaCo/PaCo can adaptively enhance the intensity of pulling samples of the same class together as more samples are drawn toward their corresponding centers, and that it benefits hard-example learning. Experiments on long-tailed benchmarks establish a new state of the art for long-tailed recognition. On full ImageNet, models from CNNs to vision transformers trained with the GPaCo loss show better generalization performance and stronger robustness compared with MAE models. Moreover, GPaCo can be applied to semantic segmentation, with clear improvements observed on the four most popular benchmarks. Our code is available at https://github.com/dvlab-research/Parametric-Contrastive-Learning. (TPAMI 2023)
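    The core idea of contrasting each sample against both the batch and a set of learnable class centers can be sketched as below. This is a hedged numpy illustration of a PaCo-style loss, not the paper's exact formulation; in particular, the asymmetric center weighting and the integration with the backbone are omitted, and the temperature value is an assumption.

```python
import numpy as np

def paco_loss(z, labels, centers, temp=0.07):
    """Sketch of a PaCo-style supervised contrastive loss.

    Each sample is contrasted against (a) the other batch samples and
    (b) one learnable center per class; positives are same-class batch
    samples plus the sample's own class center.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    n, k = z.shape[0], centers.shape[0]

    # similarities to batch samples and to all class centers
    sim = np.concatenate([z @ z.T, z @ c.T], axis=1) / temp

    # positive mask: same-label batch samples (excluding self) + own center
    pos = np.concatenate(
        [(labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool),
         labels[:, None] == np.arange(k)[None, :]], axis=1)

    # denominator excludes self-similarity but keeps all centers
    keep = np.concatenate(
        [~np.eye(n, dtype=bool), np.ones((n, k), dtype=bool)], axis=1)
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    log_prob = sim - np.log((np.exp(sim) * keep).sum(axis=1, keepdims=True))

    # mean log-probability over positives (own center guarantees >= 1)
    return -(log_prob * pos).sum(axis=1) / pos.sum(axis=1)
```

    Because every sample always has its own class center as a positive, rare classes retain a stable attraction target even when no same-class sample appears in the batch, which is the rebalancing effect the abstract describes.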

    Pressure-induced spin reorientation transition in layered ferromagnetic insulator Cr2Ge2Te6

    Anisotropic magnetoresistance (AMR) of Cr2Ge2Te6 (CGT), a layered ferromagnetic insulator, is investigated under applied hydrostatic pressure up to 2 GPa. The easy-axis direction of the magnetization is inferred from the AMR saturation feature in the presence and absence of the applied pressure. At zero applied pressure, the easy axis is along the c-direction, i.e., perpendicular to the layers. Upon application of a hydrostatic pressure > 1 GPa, the uniaxial anisotropy switches to easy-plane anisotropy, which drives the equilibrium magnetization from the c-axis to the ab-plane at zero magnetic field and amounts to a giant magnetic anisotropy energy change (> 100%). As the temperature is increased across the Curie temperature, the characteristic AMR effect gradually decreases and disappears. Our first-principles calculations confirm the giant magnetic anisotropy energy change under moderate pressure and assign its origin to the increased off-site spin-orbit interaction of Te atoms due to a shorter Cr-Te distance. Such a pressure-induced spin reorientation transition is very rare in three-dimensional ferromagnets, but it may be common to other layered ferromagnets with crystal structures similar to CGT, and it therefore offers a unique way to control magnetic anisotropy.